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Abstract 



Current low-latency anonymity systems use complex 
overlay networks to conceal a user's IP address, intro- 
ducing significant latency and network efficiency penalties 
compared to normal Internet usage. Rather than obfuscat- 
ing network identity through higher level protocols, we 
propose a more direct solution: a routing protocol that 
allows communication without exposing network identity, 
providing a strong foundation for Internet privacy, while 
allowing identity to be defined in those higher level proto- 
cols where it adds value. 

Given current research initiatives advocating "clean 
slate " Internet designs, an opportunity exists to design an 
internetwork layer routing protocol that decouples identity 
from network location and thereby simplifies the anonymity 
problem. Recently, Hsiao et al. proposed such a protocol 
(LAP), but it does not protect the user against a local 
eavesdropper or an untrusted ISP, which will not be ac- 
ceptable for many users. Thus, we propose Dovetail, a next- 
generation Internet routing protocol that provides anony- 
mity against an active attacker located at any single point 
within the network, including the user's ISP. A major design 
challenge is to provide this protection without including an 
application-layer proxy in data transmission. We address 
this challenge in path construction by using a matchmaker 
node (an end host) to overlap two path segments at a 
dovetail node (a router). The dovetail then trims away 
part of the path so that data transmission bypasses the 
matchmaker. Additional design features include the choice 
of many different paths through the network and the joining 
of path segments without requiring a trusted third party. We 
develop a systematic mechanism to measure the topological 
anonymity of our designs, and we demonstrate the privacy 
and efficiency of our proposal by simulation, using a model 
of the complete Internet at the AS-level. 



1. Introduction 

Our society has experienced a dramatic increase in the 
extent to which daily life is conducted online, with socializ- 
ing, shopping, learning, and banking via the Internet now an 
accepted norm. However, parallel advances in technology 
have enabled widespread tracking, storage, and correlation 
of our online activities, and business models have evolved 
for companies to monetize the user data they collect [1], 
[2]. Taken together, these factors mean that Internet privacy 
has become a pressing issue, and one that we argue could 
benefit from technological solutions. 

When we use the Internet, a wide range of identifying 
information is commonly revealed, but one of the hardest 
forms of identity to remove is that defined by the network 
routing protocol (layer 3), since this identity is used to 
deliver data. In today's Internet, IP is the primary layer 3 
protocol and IP addresses are in every data packet. Record- 
ing a user's IP address can allow an adversary to uniquely 
identify her, link that identity with her online activity, cor- 
relate connections to different services, and partially reveal 
her geographical and network locations. Previous work has 
proposed low-latency anonymity systems to conceal a user's 
identity [3], [4], including her IP address. Tor in particular 
has been adopted by hundreds of thousands of privacy- 
concious users worldwide [5]. Current anonymity systems, 
however, work by creating an overlay network on top of the 
layer 3 protocol, requiring a sequence of IP transmissions 
to disguise the original sender. This sequential forwarding 
and the queueing and processing required in intermediary 
nodes can incur substantial latency and network efficiency 
penalties. 

We prefer an alternative formulation for this problem: 
Rather than attempting to conceal a global layer 3 identifier 
by adding complexity in application protocols, we believe 
that the layer 3 protocol should not reveal a global identity. 
Instead, we leave identity management to higher layers 
in the protocol stack, in only those applications where it 
provides mutual benefit. 



While privacy by itself is unlikely to motivate a change 
away from IP routing, a range of additional concerns 
have emerged within the networking field [6], including 
scalability, security, mobility, challenged environments, and 
network management, leading to major research initiatives 
investigating "clean slate" Internet designs [7]-[9] that 
could be used to build the next-generation Internet (NGI). A 
wide range of different NGI routing concepts have already 
been proposed as a result of these activities [10]— [16]. 
Network virtualization research, showcased in testbeds such 
as GENI [17], offers hope for a progressive transition to 
a future routing protocol. These initiatives in NGI provide 
an opportunity to imagine anonymous communications that 
do not rely on an overlay network. 

We thus propose Dovetail, an NGI routing protocol 
that prevents association of source and destination by an 
attacker located at any fixed point within the network. 
Recently, Hsiao et al. proposed LAP, a lightweight NGI 
anonymity protocol [18]. Unlike LAP, however, Dovetail 
provides protection against observation by local eavesdrop- 
pers and by an untrusted ISP, which is a critical requirement 
for many privacy-conscious users. 

A major design challenge is to provide this protection 
without including an inline application-layer entity (i.e. a 
proxy) in data transmission, which would be much slower 
than only traversing routers. We address this challenge 
in path construction by asking a matchmaker node (an 
end host) to put together two path segments so that they 
overlap at a dovetail node (a router), and enabling the 
dovetail to trim away part of the data forwarding path to 
remove the matchmaker. This technique is implemented 
using public-key operations only at the source and the 
matchmaker, while routers use only symmetric encryption 
and decryption of short header fields and a simple hash 
chain. The protocol enables the choice of many different 
paths through the network and does not require a trusted 
third party. 

In brief, our key contributions are: (1) a novel privacy- 
preserving NGI routing protocol, (2) a systematic mech- 
anism for measuring anonymity in terms of topological 
identity, and (3) evaluation of our protocol in terms of 
topological anonymity using an Internet-scale simulation. 

The remainder of this paper is structured as follows: 
Section 2 introduces our objectives and the adversary we 
design against. Section 3 discusses both source-controlled 
routing and low-latency anonymity systems, including two 
systems that we build upon. Section 4 presents the de- 
sign of Dovetail in a sequence of increasingly detailed 
perspectives, from the broad design down to the packet 
structure. Section 5 analyzes the security of our system by 
considering potential attacks and our defenses, and derives 
the information available to a passive attacker. Section 6 de- 
scribes our evaluation of the protocol. Section 7 concludes 



with a summary of our findings and recommendations for 
further work. 

2. Objectives 

In this section, we describe the goals of the system we 
intend to deliver and the attacker we design against. 

2.1. Anonymity Objectives 

We refer to the party who initiates a connection as the 
source and the opposite party as the destination, although 
data is able to pass in both directions once the connection 
is established. Using the terminology of Pfitzmann and 
Hansen [19], we aim to provide unlinkability between the 
source and destination, such that no network location is 
able to sufficiently distinguish whether the source and 
destination are related, except for the source itself. This 
implies that network locations with good information on 
the source identity have little information on the destination 
identity, and vice versa. Throughout our work, we con- 
strain ourselves to the identifying properties defined at the 
network layer: network identity and network location, or 
topographical anonymity [18]. 

We do not protect the packet contents, which reside 
in higher network layers and are thus out of scope for 
this paper. Content should be protected end-to-end using a 
protocol such as IKEv2, which protects sender and receiver 
identities [20]. Such protection is effectively mandatory 
for strong anonymity protections, as many other forms 
of Internet identification exist, such as device fingerprint- 
ing [21] and persistent cookies [22]. Additionally, higher- 
level protocols like IKEv2 should be used with restricted 
options and implementations to limit the possibility of 
fingerprinting. 

2.2. Performance & Practicality Objectives 

Any anonymity system must offer acceptable perfor- 
mance in order to gain widespread adoption and thus 
provide a large set of potential message sources [23]. Per- 
formance problems with Tor have been widely discussed, 
and they are considered an important factor limiting its 
adoption [24], [25]. We aim to provide a lightweight system 
where all communication for an established connection 
remains within the core networking infrastructure and 
occurs at layer 3. This avoids the frequently slow last 
mile connections [26] in overlay anonymity systems and 
also the queuing required to move between layers in the 
protocol stack. Finally, we require that our system provides 
mechanisms to trade anonymity for performance. 

Another key to widespread adoption is recruiting service 
providers. Our work targets a future Internet, so Dovetail 



need only compete with other future routing protocols 
rather than motivate service providers to switch away from 
IPv4. Today's ISP business models may not apply, but it 
is unlikely that service providers are willing to spend sub- 
stantial time and infrastructure for privacy. Our goal is to 
ensure that costs for service providers are limited, such that 
benefits for privacy-aware consumers are enough incentive 
to participate in the protocol. To this end, we recognize that 
Internet routers have high throughput and low computing 
resources per flow, so we limit cryptographic operations 
and avoid maintenance of any per-connection routing state. 
Additionally, our design does not require significant extra 
traffic and does not violate basic notions of consumer- 
provider relationships that exist in today's Internet. 

2.3. Attack Model 

Selecting an attack model for anonymity systems is 
a challenging task in its own right, as the adversary 
may be different for different users and its capabilities 
are not known in advance. A few key points guide our 
decision. First, protecting a low-latency connection from 
an adversary who can observe traffic at multiple points of 
the network is very difficult. Tor uses layered encryption 
and fixed packet sizes to prevent trivial linkability, but 
this comes with significant expense and does not hide 
traffic patterns, which are linkable with a small chance 
of error [27]. Adding sufficient delays and cover traffic to 
mask traffic patterns is expensive and can be undermined 
by manipulating the patterns [28], [29]. Thus, like both Tor 
and LAP we do not aim to protect against these attacks. 
Second, users may be suspicious of any service provider 
that can link them with their Internet activities. This applies 
to anonymity service providers, such as Anonymizer.com, 
and also to Internet service providers. ISPs have proved 
to not be fully trustworthy with private browsing data [30], 
[31]. We therefore aim to prevent any element of our system 
from being able to deanonymize users. Third, a user's local 
communication may be subject to eavesdropping, e.g. at a 
wireless hotspot or by an employer. Unlike LAP we aim 
to protect against such adversaries. Fourth, many of the 
adversaries that we aim to protect against would be capable 
of various active attacks, such as replay or packet header 
manipulation, so we also aim to limit the exposure that 
such attacks might cause. 

We thus consider an adversary who is active but local. 
Active means the adversary is able to initiate connections 
and to violate the rules of the protocol for the connections 
in which she is involved, in addition to passively monitoring 
these connections. We define local as confined to a single 
Autonomous System (AS) within the Internet. ASes are 
the level at which routing information and policies are 
commonly shared, so a compromise in security at one 



router may affect multiple routers controlled by the same 
AS. In contrast, in order to span multiple ASes, an attack 
must either compromise multiple organizations or involve 
collusion between these organizations. We note that if a 
particular set of ASes were suspected of collusion, our 
client logic could easily be modified to include no more 
than one member of the set in each connection. Our 
adversary is assumed to have local knowledge of traffic, 
but global knowledge of the network topology and routing 
data. 

More concretely, the possible attackers we aim to protect 
against include: a local eavesdropper, the source ISP, the 
destination ISP, any single AS in between, any node facil- 
itating our protocol operations, and the destination itself. 
Thus, we aim for significantly greater protection than LAP 
or a centralized proxy server like Anonymizer.com. 

Given that we only protect against a single observation 
point, we offer no protection against attacks that require 
multiple observation points, even though such attacks may 
be practical for state-level adversaries [32] or Internet 
exchange points [33]. In common with LAP, but not Tor, 
we do not try to prevent trivial linkability based on packet 
contents and sizes. This means that linking attacks with 
multiple observation points need lower computational and 
storage resources and succeed with fewer observations than 
against Tor. Additionally, if both the source and destination 
are customers of the same ISP, it is simple for the ISP 
to correlate traffic. Again, Tor provides basic protection 
that makes this attack slightly harder, while both LAP and 
Dovetail provide no protection. 

Finally, given recent revelations about the NSA and 
GCHQ, some will argue that an anonymity system should 
protect against such adversaries, who may view traffic over 
undersea cables [34], [35] and perhaps could target Internet 
Exchanges (IXs) [33]. We believe that Dovetail is flexible 
enough to accommodate such consideration into the routing 
mechanisms, but we leave the design and evaluation of such 
an extension to future work. 

3. Background 

In this section, we cover two research areas of direct rel- 
evance to our problem: source-controlled routing protocols 
and low -latency anonymity systems. Within each area, we 
describe a proposal that our design builds upon. 

3.1. Source-Controlled Routing 

One theme spanning a number of next-generation Inter- 
net routing proposals is that of source-controlled routing, in 
which the originator of a data packet has some control over 
the route it takes, usually using routing control information 
carried in the data packet. In some protocols, the source 



has influence over the route but not complete control [14], 
[16]; in others, the source explicitly declares the route that 
should be taken [12], [15]. As we explain in Section 4.1, 
this ability to express a route at the source has benefits 
for anonymity in addition to the robustness and flexibility 
considerations that initially motivated the research. IPv4 
includes source control options [36], but these can be used 
to violate firewall rules and routing policies and are nor- 
mally disabled. We assert that these security concerns stem 
from inadequate consideration of security in the design of 
IPv4 [37] and need not apply to all implementations of 
source-controlled routing. 

3.1.1. Pathlet Routing. Pathlet routing [12] is one example 
of a source-controlled routing system. Each entity within a 
network defines a number of virtual nodes (or vnodes) and 
advertises path segments (or pathlets) that pass between 
these vnodes. Vnodes are a virtual construct, so a single 
physical router may process packets for multiple vnodes, or 
a single vnode may be distributed across multiple physical 
routers. Each vnode is defined by a forwarding table con- 
taining the set of allowed outgoing pathlets. All packets ar- 
riving from a particular communication peer are processed 
by one vnode whose forwarding table defines the set of 
allowed routes for that peer. The pathlet protocol provides 
an expressive system that is able to represent many different 
types of routing policy, although more complex routing 
policies require a greater number of vnodes. Godfrey et 
al. demonstrate that local transit policies (i.e. policies 
only dependent on ingress and egress points for their own 
network) may be represented efficiently and independently 
of the total network size [12]. This is in contrast to the 
BGP exterior gateway protocol [38] commonly used today, 
where the forwarding information base must scale linearly 
with the total number of advertised IP prefixes. 

To send a packet, the source assembles a list of adjacent 
pathlets defining the intended route and includes this list in 
the packet header. Each pathlet is represented by a variable 
length Forwarding ID (FID), an index into the forwarding 
table of the vnode that defined the pathlet. When a vnode 
receives a packet, it removes the first FID and uses this 
as an index into its forwarding table to determine which 
link the packet should be sent over. Only legal routes 
are defined in the forwarding tables. Therefore, unlike 
in BGP, it is impossible to violate the routing policy by 
invoking unannounced routes, since no such routes exist. 
When a vnode learns of adjacent single-hop pathlets, it 
may choose to aggregate these together into a composite 
pathlet. The translation from this composite into single- 
hop pathlets is stored in the forwarding table, so when a 
packet requests the composite path, the forwarding table 
is used to restore the component pathlets. Pathlet routing 
moves the responsibility for network route creation from 



the network infrastructure to the end hosts originating 
traffic. This means the large routing information base 
embodying network topology need only be consulted each 
time a new route is constructed, and not each time a 
packet is forwarded. It also provides flexibility for an end 
host to control how its packets will traverse the network. 
A source may rapidly select alternative routes to achieve 
better performance or compensate for network failures, 
instead of waiting minutes for an exterior gateway protocol 
to converge upon a new route. 

3.2. Low-Latency Anonymity Systems 

Previous works have introduced a variety of low-latency 
anonymity systems whose response time is sufficient for 
general-purpose interactive use, including Web browsing. 
A wide range of practical systems have been proposed 
and some of these have been fielded [3], [4], [39], [40]. 
Current low-latency anonymity systems may be categorized 
as either centralized or distributed. Centralized systems 
pass all traffic though an anonymizing proxy, which must 
be trusted. Distributed systems overlay an additional net- 
work on top of the current layer 3 protocol and therefore 
require multiple IP transmissions to deliver each packet 
from source to destination. These multiple transmissions, 
together with processing inside the intermediate hosts, 
contribute to latencies that are substantially higher than 
Internet usage without anonymization [41]. 

3.2.1. Lightweight Anonymity and Privacy. In 

Lightweight Anonymity and Privacy (LAP) [18], Hsiao et 
al. propose the anonymity scheme that inspires our work. 
Their protocol relies upon packet-carried forwarding state, 
where the information required to deliver a packet is 
stored within the packet itself. To establish a connection, 
the source constructs a packet containing a sequence 
of autonomous domains (ADs) describing the route. As 
each AD receives the packet, it encrypts its own routing 
instruction using a private symmetric key and forwards 
the packet to the next AD. Once a connection has been 
constructed in this manner, data may be exchanged 
between the endpoints using the resulting encrypted 
header. Each path construction request contains a nonce 
that influences the encryption process, allowing a source 
to construct multiple unlinkable connections over the same 
route by using different nonces. Header padding may be 
included to partially obfuscate the path length. During 
construction, each AD on the path learns the identity of 
all ADs that follow it but not the identity of the ADs 
before it. Some information on predecessor identity may 
be inferred based on knowledge of the preceding AD, 
network topology, routing policy, observed header length, 
and observed response time, but these are not quantified. 



LAP assumes the user's own ISP is trustworthy, and it 
provides no protection of source-destination unlinkability 
against a local eavesdropper or an observer at the source 
ISP. Given previous well -publicized ISP indiscretions [30], 
[31] and the possibility of a hacker infiltrating this single 
point of failure, it seems unlikely that privacy-conscious 
users will share this assumption. 

Other than LAP, ANDaNA is the only other next- 
generation Internet anonymity protocol that we know 
of [42]. It is only designed for named-data networks and it 
is built using onion routing, both of which are very different 
from Dovetail. 

4. Design 

In this section, we first provide context for our design 
point and then describe the protocol from four different 
perspectives in increasing detail. 

4.1. Layer 3 Anonymity Design Space 

To provide a broadly applicable anonymity system, we 
assert that any layer 3 solution should provide two features: 
Deviation from shortest path. An eavesdropper can 
measure information on the length of the network path 
before and after her 1 vantage point. If a routing protocol 
always selects the shortest possible route, then when the 
shortest route between participants is significantly shorter 
or longer than the Internet average, the protocol will reveal 
this abnormal distance and limit their anonymity. 
Partitioned routing information. When the routing in- 
formation is stored as a single field, such as an IP address, 
any entity with access to the field may calculate the desti- 
nation identity. When routing information is divided across 
multiple fields, then an entity must access multiple fields 
to learn the destination identity. Fields may be protected 
independently to prevent this access. 

Source-controlled routing is useful since it accommo- 
dates both of these features: when the source of a message 
can dictate a path, she is free to pick one that is not 
the shortest, and she may express the path as a separate 
instruction for each entity along the route. Dovetail uses 
construction requests that are traceable in the forwards 
direction as presented by Hsiao et al. [18]. Our detailed 
design builds upon the pathlet routing protocol presented 
by Godfrey et al. [12]. Pathlet routing works well for our 
system, but we are not reliant on any unique feature of 
this protocol. The principles we describe could be applied 
to any protocol that provides complete control over the 
selected route and a wide range of allowable routes. 

1. Throughout this paper, we use the genders of the standard security 
actors: The source, Alice, is female, the destination, Bob, is male, and the 
attacker, Eve, is female. 



4.2. Network Model 

We propose a clear distinction in routing at the AS 
boundary; each AS should expose the minimum number 
of vnodes and pathlets necessary to satisfy its routing poli- 
cies. This distinction provides two practical benefits: First, 
minimizing the number of externally visible vnodes reduces 
the size of the routing information base that must be held 
in end hosts. Second, distinguishing between internal and 
external connectivity allows an AS to retain a flexible and 
dynamic internal routing policy. As with BGP, adjacent 
ASes share routing information to establish the network 
topology. This communication should be secured against 
MITM attacks that could selectively filter the topology. We 
assume that hosts know the numeric identity of the vnodes 
they wish to contact. An equivalent to DNS would be 
required to translate human-readable identities into vnode 
identities. The translation service itself could be accessible 
using Dovetail to protect privacy, but is outside the scope 
of our current work. 
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Figure 1 . AS vnode and pathlet structure by routing 
policy: a) Strict valley-free, b) Loose valley-free 

The most common form of routing policy used in the 
Internet today is valley-free routing [43], which reflects 
the contractual relationships between ASes. A customer 
AS is one who pays a provider AS to forward its traffic, 
while two ASes with a peer relationship will each forward 
each other's traffic without payment. In valley-free routing, 
each AS will only forward traffic when there is a financial 
incentive to do so, i.e. when the traffic originates from 
or is destined for a paying customer. As illustrated in 
Figure la, two vnodes are required per AS to enforce 
this strict definition of a valley-free routing policy: one to 
receive traffic from customer ASes and one to receive traffic 
from peer and provider ASes. Although valley-free routing 
is common, BGP allows for arbitrarily complex routing 
policies and valley-free routing is not ubiquitous [44]. In 



particular, there are a growing number of Internet exchange 
points (IXPs), which offer ASes the ability to peer with 
each other and thereby save money [45]. Most transit and 
access provider ASes will peer with any non-customer 
AS [46]. This suggests that peering is compatible with 
ASes' incentives and is likely to continue to be common. 

We thus consider a slightly relaxed routing policy, which 
we refer to as loose valley-free. In this scheme, an AS 
will allow traffic to pass between its peers. The AS would 
not receive payment from a customer for performing this 
service, but also is not required to make a payment and 
could avoid payments at other times if peers provide a 
reciprocal service. As shown in Figure lb, three vnodes 
are required per AS to enforce a loose valley-free routing 
policy: one to receive traffic from customer ASes, one to 
receive traffic from provider ASes, and the third to send 
and receive peer traffic. 

For good anonymity properties as described in Sec. 4.4, 
Dovetail relies on a modest fraction of ASes to use the 
loose valley-free policy or other policies that are less strict 
than valley-free routing. If all ASes use strict valley-free 
routing, Dovetail still provides anonymity, but with smaller 
anonymity sets. 

4.3. Path Construction 

A Dovetail path comprises multiple path segments. As 
we explain in Section 5.2, an AS that is present on a path 
segment may learn the identity of subsequent ASes and its 
direct predecessor, but not earlier ASes. Figure 2 illustrates 
the Dovetail path creation process. 
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Figure 2. Construction of a Dovetail connection 

The path cannot be constructed directly from the source 
to the destination, since the source's ISP would be able 
to link source and destination. Instead, we make use of a 
randomly selected, untrusted third-party vnode called the 



matchmaker. This matchmaker may either be an end host 
or functionality exposed by a service provider. Providing 
matchmaker services should cost little relative to enabling 
our protocol in routers. The identities of vnodes willing to 
act as matchmakers could be distributed as a part of routing 
information maintenance. 

The source encrypts the identity of the final destination 
using a public key for the matchmaker and builds a head 
path segment to the matchmaker, who then extends the path 
to the destination with a tail path segment. Here, the source 
ISP no longer learns the identity of the destination, only 
of the matchmaker. The matchmaker learns the identity of 
the destination, but cannot identify the source through the 
intervening ASes. The source may learn the matchmaker's 
public key without compromising anonymity by requesting 
a signed certificate over the same path used to establish 
the connection. To improve performance and minimize the 
trust we must place in the matchmaker, we prefer that 
the matchmaker not be involved in the exchange of data. 
Therefore, we require that the head and tail segments cross 
at some vnode, referred to as the dovetail 1 . The source 
encrypts the identity of the dovetail and provides it to the 
matchmaker for inclusion on the tail segment. The dovetail 
detects the crossing condition and joins the two segments, 
removing the loop in the path along with the matchmaker. 

The tail path segment would ideally be selected by the 
source, but the source does not have complete knowledge 
of distant Internet topology. The matchmaker has sufficient 
knowledge to construct a path to the destination, but the 
user's anonymity can be degraded if an AS appears on both 
the head and tail segments, and therefore we prefer that the 
tail segment avoids ASes already used on the head segment. 
Providing a list of head ASes to the matchmaker would 
reveal substantial information on the source identity, so 
instead we ask the matchmaker to return a set of potential 
tail routes that the source selects from. The source then 
sends its choice to the matchmaker to complete the route. 
For brevity, we do not include this tail selection mechanism 
in our description of the packet design, but we do examine 
its effect on anonymity in Section 6.3. 

4.4. Segment Route Selection 

A source-controlled routing system may attempt to ob- 
fuscate path length, but an attacker located on the path 
will be able to infer some information about her distance 
to the source and destination through round trip timing, 
packet length and structure analysis, and active probing. We 
prefer a system that is robust even when an attacker learns 
path length to one that relies on keeping it hidden. For the 

2. We use the term to reflect a dovetail joint in carpentry, where two 
elements are joined securely and compactly 



remainder of the discussion, we assume the attacker has 
perfect knowledge of the number of ASes preceding and 
following her own, but limit the value of this knowledge 
through a non-deterministic path selection process. 

Our mechanism for routing each path segment is based 
upon the principle of path diversity, where a large number 
of possible paths may be taken from any given source 
to any given destination. We note that this is beneficial 
for the robustness of the system in addition to its an- 
onymity. To achieve path diversity, each host must have 
a comprehensive, but not necessarily complete, map of 
the network. We extend the pathlet routing protocol by 
exporting extra pathlets in addition to the shortest path tree 
(SPT). The optimal set of additional pathlets depends on 
network size and topology, but our experiments show that 
for the current Internet, is it appropriate to export 50% 
of the SPT size, selecting pathlets closest to the sender. 
An important consequence is that routing knowledge varies 
across the network, and so any assessment of available path 
options can only be made in the context of the vnode (in 
our case, the source or the matchmaker) selecting the path. 
Maintenance of routing information in response to network 
changes could be performed using path vector distribution 
methods similar to BGP [38], but this is not relevant to the 
anonymity properties of the system and so is not discussed 
further. 

When a host constructs a path segment, it will normally 
have a wide range of options available with different 
costs, where we define cost as the number of times the 
route changes AS. Other cost metrics such as latency or 
bandwidth could also be integrated into the protocol. The 
distribution of options across cost reflects the network 
topology between the source and destination. Selecting a 
random path uniformly from among the complete set of op- 
tions would reveal information about this distribution, such 
as picking the most common path cost most frequently, and 
thus leak information about the topology. Instead, we use 
a cost window approach: we select a path by first selecting 
a path cost and then randomly selecting one of the paths 
at this cost. We explain this scheme further in Section 6.2. 

4.5. Data Packet Structure 

Dovetail extends the basic packet format used in pathlet 
routing, providing a set of different packet types and pro- 
cessing algorithms for each type. These algorithms provide 
the following security properties: 

1) An AS does not learn the identity of ASes before its 
immediate predecessor. 

2) AS routing information is protected by a key known 
only to the AS. 

3) Different connections travelling over the same route 
do not produce the same ciphertext. 



4) The final ciphertext for each AS depends on the entire 
path. 

5) An AS may only create a removable loop in the path 
when given access to privileged information. This 
information is only given to the matchmaker. 

Each Dovetail packet contains a type identifier, followed 
by one or more variable-length header segments, followed 
by the payload. Table 1 summarizes the notation we use in 
this section. Figure 3 presents the packet types in terms of 
their header segments, while Figure 4 presents the structure 
of each header segment. 

Plain Packet 

| Type | Unencrypted Segment | Payload ^ 
Path Construction Packet 

|Type| Unencrypted Segment | Transit Segment | Join Segment |Payload^ 

Construction Return Packet, Encrypted Data Packet, 

and Encrypted Response Packet 

| Type | Unencrypted Segment | Transit Segment |Payload^ 

Figure 3. Structure for each packet type 



Table 1 . Packet structure notation 



Term 


Definition 


U 


Unencrypted packet segment. Stores 




partial path during construction and 




while traversing an AS. 


T 


Transit packet segment. Stores complete 




bi-directional path in encrypted form. 


J 


Join packet segment. Facilitates routing 




loop detection during path construction. 


T A 


Transit entry for AS A. 


Ja 


Join entry for AS A. 


id A 


Identifier for AS A. 


1%A 


Transmission link used to enter AS A. 


k A 


Symmetric encryption key for AS A. 


Pa 


Sequence of pathlets traversing AS A 




in the forward direction. 


Qa 


Sequence of pathlets traversing AS A 




in the reverse direction. 


m A 


Maximum number of bits required to 




represent any path traversing AS A. 


Nl, N2 


Nonce values. Initialized randomly then 




modified during path construction. 


off(a;) 


Offset of field x from segment start. 




Transit entries preceding Ta- 


F(T A ) 


Transit entries following Ta- 


H{x) 


Cryptographically secure hash of x. 


E(k, v, x) 


Encryption of x with key k and IV v. 


D(k, v, x) 


Decryption of x with key k and IV v. 



if U is empty then 

perform matchmaker processing or send 
construction return packet as described in text 
else if packet received from different AS then 
JoinFound <— False 
for J x in J do 

aK(Tx)\\lix<-D(k A ,N£\\Nl,Jx) 
if lix and off(Tx) in legal range then 
x\\p\\q ^ D(k A ,P{T x )\\Nl,T x ) 
if x — id a then 

JoinFound <— True 
adjust paths in T x for new AS exit 
truncate T after T x 
truncate J after Jx 
break 
end if 
end if 
end for 

if not JoinFound then 

read portion of U passing through A, p A 
lookup path in opposite direction, q A 
pad p A and q A to a length of m A 

T A ^ E(k A ,P(T A )\\Nl,id A \\p A \\q A ) 
append T A to T 

J A ^ E(k A ,N2\\Nl, off {T A )\\li A ) 
append J A to J 
end if 

N2 <- H{N2) 

perform plain packet processing 

else 

perform plain packet processing 
end if 



Unencrypted Header Segment (U) 
| Size | FID | fId] ••• | FID | 

Transit Header Segment (T) 

I Size I Offset I N1 I | I ••• I I 

— — — x — _ 

On construction E(k B ,P(T B )||N1 , |id b |fid| »»» |fid|fid| »»»[f1d1) 

After return E(k B ,F(T B )||N1 , |id b |fid| ««« |fid|fid| «««|fid|) 

Join Header Segment (J) 1 * " v 

I Size I N2 I ~ k I ~ B I ••• I In I Pb qB 

~ r ~^ 

E(k R ,N2||N1, [ off(T B ) | li B | ) 

Figure 4. Structure for each header segment type 

Dovetail does not perform cryptographic operations on 
the packet payload, only on routing information in the 
packet header. Restricting our operations to a small number 
of bytes in the header allows for fast operation within the 
routing infrastructure. Privacy-preserving transport layer 
protocols, such as IPsec, may be used to encrypt or authen- 
ticate data when needed. To create a Dovetail connection, 
the source issues a path construction packet, leading the 
destination to respond with a construction return packet. 
Once the response is received, the source and destination 
may continue to communicate over the connection using a 
sequence of encrypted data and encrypted response packets, 
each containing the transit segment created during path con- 
struction. Vnodes need not store any per-connection state 
in order to exchange data. We now discuss each of these 
packet types in turn, in order of increasing complexity. 

4.5.1. Plain Packet. A plain packet follows the underlying 
pathlet routing protocol without the anonymity features 
introduced by Dovetail. A plain packet only contains an 
unencrypted header segment defining a string of forwarding 
table indexes (i.e., FIDs). Upon receipt of a Plain Packet, 
each vnode follows the standard pathlet packet processing 
algorithm (Figure 5). 

4.5.2. Path Construction Packet. A path construction 
packet is used to establish an encrypted path through a 
matchmaker and dovetail. The source initially constructs 



if U is empty then 

deliver packet to application layer 

else 

read and remove first FID from U, f 
prepend FIDs at forwarding table index / to U 
send packet over link at forwarding table index / 
end if 

Figure 5. Plain Packet processing 



Figure 6. Path Construction processing within AS A 



the packet containing a path to the matchmaker in the 
unencrypted segment but no entries in the transit and join 
segments. Upon receipt of a path construction packet, the 
receiving vnode follows the algorithm shown in Figure 6. 
Each time the packet enters a new AS, tests for a loop join 
condition by searching for previous join entries encrypted 
with a matching key and nonces. If the join condition is 
met, the AS deletes all transit and join entries after its 
first inclusion on the path. If the join condition is not met, 
the AS adds a new encrypted transit entry to describe the 
shortest path through the AS in both directions, and a new 
fixed-length join entry that enables location of the variable- 
length transit entry. Both the join and transit entries are 
encrypted with a symmetric key known only to the AS . This 
key is used for all connections through the AS but should 
be changed periodically to limit the impact of compromise. 
A secure hash function is applied to N2 at each new AS, 



similar to a Lamport hash chain [47], preventing an attacker 
from predicting previous nonce values and therefore from 
creating artificial joins. 

When a path construction packet is received with an 
empty unencrypted segment, the response depends on the 
packet payload. If the payload is empty, which is the 
case at the destination, the receiver replies with a con- 
struction return packet. This packet is created from the 
path construction packet by discarding the join segment, 
adding a payload containing Nl , and then updating Nl to 
H(N1\\Ta---Tn). If the payload contains a continuation 
request, which happens at the matchmaker, the matchmaker 
decrypts the continuation request with its public key to 
learn the identities of the destination and dovetail, the cost 
to dovetail, and a prior hash of N2. The matchmaker uses 
this information to compute possible tail paths and return 
these to the source for selection as discussed in Section 4.4. 
The matchmaker then removes the packet payload, creates 
a new unencrypted header segment containing the selected 
tail path, and sets N2 to the prior hash received from the 
source. This last step ensures that the dovetail receives the 
same value of N2 on both the head and tail segments. 

4.5.3. Construction Return Packet. A construction return 
packet is used to return the selected path to the source. 
The transit segment is re-encrypted during return using the 
algorithm in Figure 7, such that the final ciphertext depends 
upon the entire selected path. 



if packet received from different AS then 
use offset to locate Ta within T 
x\\p A \\q A <r- D(k Al P{TA)\\payload,T A ) 
if x ^ id a then 

return > Invalid packet 

end if 

replace current entries in U with q A 

T A ^E{k A ,F{T A )\\Nl,idA\\p A \\q A 
offset <— offset — (le,n(id A ) + 2itia) 
end if 

perform plain packet processing 
Figure 7. Construction Return processing within AS A 



4.5.4. Encrypted Data Packet. An encrypted data packet 
is used by the source to send data over an existing Dove- 
tail connection. The packet is constructed with an empty 
unencrypted segment and the previously received transit 
segment, with an offset set to zero. Each time an encrypted 
data packet enters a new AS, the current transit entry is 
decrypted and the unencrypted segment is replaced with the 
forward path from this entry, as described by the algorithm 
in Figure 8. 



if packet received from different AS then 
use offset to locate T A within T 
x\\p A \\q A ^D(k A ,F(T A )\\Nl,T A ) 
if x 7^ id A then 

return > Invalid packet 

end if 

replace current entries in U with p A 

offset <— offset + (len(id ^) + 2mA) 
end if 

perform plain packet processing 

Figure 8. Encrypted Data processing within AS A 

4.5.5. Encrypted Response Packet. An encrypted re- 
sponse packet is used by the destination to send data over 
an existing Dovetail connection. The packet is constructed 
with an empty unencrypted segment and the previously 
received transit segment. The offset is set to the end of the 
last transit entry. Each time an encrypted response packet 
enters a new AS, the current transit entry is decrypted and 
the unencrypted segment is replaced with the reverse path 
from this entry, as described by the algorithm in Figure 9. 



if packet received from different AS then 
use offset to locate T A within T 
x\\p A \\q A <-D(k A ,F(T A )\\Nl,T A ) 
if x 7^ id A then 

return > Invalid packet 

end if 

replace current entries in U with q A 

offset ■<— offset — (len(id ^) + 2mA) 
end if 

perform plain packet processing 

Figure 9. Encrypted Response processing within AS 

A 

5. Security Analysis 

In this section, we assess the security of the Dovetail 
protocol. We consider a range of anonymity attacks that 
might be applied against the protocol and then analyze the 
information available to a passive attacker at each point 
in the network. We end with brief discussions of timing 
attacks and attacks on availability and integrity. 

5.1. Attacks on Anonymity 

As Dovetail is lightweight, it does not protect against 
attacks that succeed against an overlay system like Tor. 
In particular, an entity who can observe traffic at multiple 
points in the connection can link both of those points, which 



can link a source to her destinations. In Dovetail, this is 
trivial, as the packet contents are not encrypted differently 
at different points in the network. In Tor, however, timing 
analysis can enable this linking with high accuracy [28], 
[29]. Other attacks that rely on multiple points of ob- 
servation, such as selective denial of service [48] and 
predecessor [49] attacks will be just as effective in Dovetail. 
Additionally, Dovetail is vulnerable to the same types of 
side-channel attacks that impact Tor [50]-[54]. 

Beyond this, however, we need to examine additional 
attacks that could leverage the unique aspects of the 
Dovetail protocol. The primary information available to a 
passive attacker in the network is the cost to the source and 
destination and the preceding and following ASes in the 
path, and we examine the affect of these on anonymity in 
Sections 5.2 and 5.3. Timing attacks are considered briefly 
in Section 5.4. Other attacks include: 

Observe or correlate packet content. Dovetail is a 
layer 3 protocol and does not provide any protections for 
the data it carries. In cases where packet content would 
reveal identity, or where confidentiality is important, a 
higher layer protocol such as IKEv2 should be used to 
provide encryption [20]. 

Correlate connections from a source. Each connection 
includes a source-defined nonce, Nl. When the source 
changes this nonce, a different ciphertext will be pro- 
duced, preventing an observer from associating multiple 
connections over the same path from their header content. 
When connections between a source-destination pair are 
distinctive, and may hence be correlated by some other 
property, the source could reuse the same matchmaker and 
path to prevent intersection and predecessor attacks. 

Replay packets. A replayed packet will take the same path 
as its original transmission and therefore not provide an 
attacker with new information. An adversary might try to 
probe for the source by prepending an unencrypted path to 
a recorded packet, but each AS empties the unencrypted 
segment on receipt to prevent this attack. 

Probe for a later AS. To determine the destination of an 
observed connection, an attacker may try to construct many 
new connections through the same dovetail and search for 
matches in the header ciphertext. Dovetail protects against 
this attack by including a hash of the entire path in the IV 
for encrypted transit segments. Any change in the selected 
path will therefore perturb the ciphertext for all segments. 

Probe for an earlier AS. The joining of a Dovetail path 
provides confirmation that the joining AS appeared on the 
path twice, and an attacker may wish use this feature 
to probe for suspected predecessors. During connection 
construction, an attacker may attempt to extend the path to 



a suspect and then back to herself, where she could observe 
whether a join occurred. Our use of hash chaining on N2 
prevents this attack, since the attacker cannot replicate the 
nonce initially presented to the suspect. The matchmaker 
is provided with an earlier nonce to create a legal join and 
may perform some probing, but this is heavily constrained 
by the dovetail-matchmaker cost limit. 

Matchmaker intersection. The matchmaker provides the 
source with a set of possible tail segments from which the 
source picks one. Since the source will not select an AS 
already on the head segment, including it's own ISP, the 
matchmaker could try to offer tail segments that help it 
isolate possible source ASes. In particular, if there is a 
source AS of interest A, then the matchmaker could pick 
tail segments that include likely ASes between itself and A. 
If the source avoids these tail segments, it adds to the like- 
lihood that the source is in A. However, fully unmasking 
the source AS with this type of intersection attack would 
require a large number of requests. As matchmakers are 
selected randomly from a large set, an attacker located 
at any particular matchmaker is unlikely to receive many 
connection requests from the same source. 

Modify the requested path. An AS along the path could 
modify the unencrypted header segment to alter the route 
taken for the remainder of the path segment, but gains 
little from doing so. All vnodes along a path segment can 
identify the destination, and earlier vnodes have a better 
knowledge of the source. Thus, an attacker that places 
herself later in the same path segment does not learn any 
additional information regarding source or destination. 

Modify the tail path. The matchmaker could use a differ- 
ent tail option than that selected by the source. However, 
the matchmaker does not learn whether unselected paths 
were acceptable and cannot identify the source and so 
cannot predict whether a particular path will be bad for 
that source. A matchmaker could speculatively route all 
connections through a particular ISP to allow identification 
of any sources within that ISP. This attack may be effective 
given a sufficient number of matchmakers, but widespread 
collusion falls outside our attack model. 

5.2. Single Segment Anonymity Analysis 

We now examine the source and destination anonymity 
at each point along a single path segment as a step towards 
analyzing the complete path. We present two approaches for 
assessing single segment topological anonymity: a simple 
and efficient approach based on set size and a more accurate 
approach based on entropy. 

5.2.1. Anonymity Set Size. Consider an attacking 
AS, ASi, located at cost i in a path segment 



AS 0 , AS 1: ■ ■ ■ AS^ASi ■ ■ ■ AS n . This AS can identify 
its predecessor, ASi-i, but cannot directly identify earlier 
ASes, since their routing instructions have been encrypted. 
The remaining portion of the segment is not encrypted, and 
therefore all following ASes and the destination are known. 
ASi can accurately measure the path cost from the source 
to itself by using the length of the join segment in the 
packet header, and thus it can deduce the total path cost n. 
We use Sy S:c to represent the set of possible sources that 
have a shortest path to AS X of cost y. 

Shortest Path. When the shortest path is used, an ad- 
versary knows that the observed cost for ASi-\ and for 
all subsequent ASes must be the shortest cost from the 
true source. The set of possible sources is therefore the 
intersection of the possible source sets for each observed 
AS: 

n 

sources(shortest path, ASi) = P) S^ S3 

i=(i-i) 

If n is close to the minimum or maximum cost present in 
the Internet, then few sources will fall into this intersection, 
leading to an uncomfortably small anonymity set. 

Cost Window Selection. As an alternative, we propose a 
cost window selection algorithm to select uniformly at 
random a length between some global minimum p (or the 
shortest path cost if it is greater) and some global maximum 
q. Ideally for anonymity, q should be greater than the 
maximum shortest path cost in the network. Given cost 
window selection, an attacker cannot make any statement 
about the possible message senders, except that each must 
be able to form a path of the observed length to the 
observed predecessor ASi-\. Our experiments show that 
as long as at least a small fraction of ASes use a loose 
valley-free routing policy, long path choices are plentiful. 
This means that, in most cases, a source can produce a path 
at any given cost greater than the minimum. We find this 
to be true 96% of the time when 10% of ASes are loose 
valley-free. Making an approximation that this is always 
true, then the set of possible sources is simply the union 
of the sources at every distance less than or equal to the 
observed value: 

i-l 

sources (cost window, ASi) = [_) S^^ 1 

3=0 

AS ■ 

By examining the relationship to A^^ 1 , it is clear that: 

sources ( short est path, ASi) C sources(cosi window, ASi) 

Hence, cost window selection provides an equal-sized or 
larger source anonymity set in all cases. In addition, n is 
capped to a minimum of q, and therefore very short path 
costs with unavoidably small anonymity sets will never 



be generated. Using cost window selection, even sources 
with a low shortest path cost will occasionally select very 
long paths, and therefore the average latency will be higher 
than in shortest path selection. A non-uniform probability 
distribution can be used to control how frequently larger 
costs are selected and limit this performance penalty. 



Table 2. Effective anonymity set size notation 



Term 


Definition 


S 


The set of all possible sending vnodes 


s 


The true source vnode 


d 


The destination vnode 


P 


The path selected to send data from s to d 


a 


An attacking vnode located on p and wishing to 
identify s within S 


a' 


The vnode on p immediately preceding a 


tpO) 
\ p (x) 


The portion of path p after vnode x 


The cost along p from the source to x 


OBS a 


The set of observations available to a 


R(x,y) 


The set of paths from x to y 


R(x,y,X) 


The set of paths from x to y of cost A 



5.2.2. Effective Anonymity Set Size. The preceding anal- 
ysis is simple and efficient for an attacker to compute using 
only shortest path distances, but with complete knowl- 
edge of routing tables, she can achieve a better result. 
The probability that a given potential source selected the 
observed path depends on the available path options for 
that source, and thus it is not uniform across the set of 
potential senders. These differences in probability allow 
calculation of an effective anonymity set size based on 
the entropy of the distribution. The notation we use in 
this section is summarized in Table 2. For each possible 
source t, the probability of selecting the observed path cost 
Pcost match (t) depends only on the path selection algorithm 
and the presence or absence of paths in R(t, d) at each cost, 
not the number or definition of these paths nor the location 
of the attacker. Section 6.2 presents a series of different 
options for path length selection, along with their cost 
selection probabilities. Once a cost has been selected, a path 
of this cost is chosen randomly from the available set. The 
probability that t chose a path matching the observations 
is thus given by the fraction of paths that both place the 
observed predecessor at the observed cost and that match 
the observable portion of the path: 

^predecessor match (t) 

\a, a e R(t, d, X p (d)) A \ a (a') = X p (a') A t CT (a') = t p (a')\ 
\R(t,d,X p (d))\ 

We assume that the a priori probability P apr {t = s) of each 
vnode t being the source of a given message is uniform. 
The probability that t is the true source given an attacker's 



observation may then be calculated by Bayes theorem: 

^(OBSqI^ — 8) Pcost match\p) ^ ^predecessor matchi^) 

P(OBS a |t = s) x P apr (t = s) 



P{t = s|OBS a ) = 



E ^(OBS a |i = s) X P apr (i = s) 



Finally, we may use this set of potential source probabilities 
to compute an effective source anonymity set size based 
on the entropy of the distribution by using the technique 
proposed by Serjantov and Danezis [55]: 

S=-J2 p (t = s|OBS a ) log 2 (P(t = s|OBS a )) 
tes 

5.3. Complete Path Anonymity Analysis 



A passive adversary who observes the construction of a 
Dovetail path segment has full knowledge of the remainder 
of the segment and partial knowledge of the segment 
source. She may learn further information from observing 
the return path. We now discuss the complete set of infor- 
mation available regarding source and destination identity 
at each location on a Dovetail path. When a measurable 
cost is available, a set of possible identities can be built 
using the techniques defined above in Section 5.2. 

Source Identity. The source identity is known to the 
source ISP. An attacker at each subsequent AS towards 
the matchmaker (which includes the dovetail node) can use 
its knowledge of the preceding AS identity, cost from the 
source, and all subsequent pathlets up to the matchmaker 
to limit the possible source identities. At the matchmaker 
itself, for paths of more than three or four hops, the 
number of possible sources should be quite large. After the 
matchmaker, the amount of information about the source 
will be even less. 

Destination Identity. The destination identity is known to 
every ISP from the matchmaker to the destination ISP due 
to the construction request. Any AS on the head segment 
out to the matchmaker, but that does not appear on the 
data path, has no knowledge of the destination. Between 
the source and the dovetail, an attacker can measure the 
cost from the destination to her own AS using the data 
return path. If the attacker is able to guess which AS on 
the head segment serves as the dovetail, she can infer cost 
from the destination to the dovetail. 

As intended, locations where the source is easily identified 
have little information about the destination and vice versa. 
The dovetail is the closest AS to the source that leams 
destination identity; it is typically the strongest location 
for a passive attacker. To avoid elevating the capability of 
an attacker located at the dovetail AS, we require that this 
AS only appear on the head segment once. Any other AS 



that appears twice in a given segment gains no additional 
information from its second inclusion. 

Each segment of the dovetail path serves a purpose in 
maintaining a particular anonymity property; this should 
be considered when setting the segment length. The head 
segment must be long enough to conceal source identity 
from the dovetail, and the tail segment must be long enough 
to conceal destination identity from the source ISP. Finally, 
we note that uniform random selection of the matchmaker, 
uncorrelated with either the source or destination, is effec- 
tive in isolating the anonymity properties of our system. 
An AS on the head segment can identify the matchmaker, 
but this does not help to identify the destination; an AS on 
the tail segment may be able to identify the matchmaker, 
but this does not help to identify the source. 

5.4. Response Timing Attacks 

The path diversity used to select each segment should 
hinder an attacker's ability to identify participants from 
response timing data. Each potential source could have 
used one of many thousand possible routes to reach the 
destination, and each of these routes has its own latency 
distribution. The superposition of these distributions blurs 
the range of possible response times for a source signif- 
icantly when compared to shortest path routing and thus 
makes distinguishing between different sources harder. 

A performance-optimized version of Dovetail could con- 
sider geographical distance or latency in its selection of a 
matchmaker, which in turn would change the anonymity 
properties of the protocol. 

We consider this integration of performance and anony- 
mity concerns in response to network latency information 
to be a rich avenue for further study. 

5.5. Availability and Integrity Attacks 

Violate routing policy. As with pathlets, all forwarding 
tables entries are valid expressions of the routing policy, 
and hence it is not possible to construct a path that violates 
this policy. 

Construct arbitrarily long paths. Our packet design 
constrains the maximum length of both encrypted and 
unencrypted packet header segments and thus limits the 
longest path an adversary intending to waste resources can 
construct. 

Overload a matchmaker. A matchmaker could be over- 
loaded by sending a large number of continuation requests, 
but matchmakers are distributed throughout the network 
and the effect on clients is minor if the first matchmaker 
they contact is unavailable. 



Overload a routing vnode. Our forwarding operations 
are simple and intended to operate at the full data rate of 
a router. Connection construction requires more operations 
and a cache of recent nonces, but a maximum connection 
rate could be enforced to constrain this resource utilization. 

Pollute routing tables. Securing the integrity of routing 
tables updates is an important requirement, but we do not 
consider the routing maintenance protocol here hence it is 
outside the scope of our work. 

Modify packet contents. Dovetail is a layer 3 protocol 
and does not provide any protections for the data it is used 
to carry. In cases where integrity is important, a higher 
layer protocol should be used to provide authentication. 

Discard packet data. If the quality of service provided by 
a connection drops below some threshold, this would be 
observed as a failure, for which the recommended remedy 
is to reconnect over a different path. Paths are constructed 
by random selection from the available routes, and so 
this reconnection is likely to remove any intermediate AS 
discarding data. 

6. Evaluation 

Our proposal is evaluated primarily by simulation, using 
a model of the complete Internet at the AS level. In this 
section, we first introduce our simulation and input data, 
then discuss the anonymity and cost results for path seg- 
ments and for complete paths, and conclude by estimating 
a variety of resource requirements for our system. 

6.1. Simulation Scope 

Our simulation models a network of ASes, each con- 
taining up to three routing vnodes plus host vnodes to 
represent its end users and matchmaking capability. ASes 
are connected by pathlets that codify their contractual ar- 
rangement; customer, provider, or peer. All pathlets within 
an AS have a cost of zero and all pathlets between different 
ASes have a cost of one. We simulate the exchange of 
routing information at initialization, leading to a unique 
routing perspective for each AS that contains all routing 
vnodes but not all pathlets. Separately, we simulate packets 
at a bit level during a connection, allowing us to test header 
design to ensure that routers and the matchmaker could 
correctly run the protocol. 

Our Internet topology is derived from the CAIDA in- 
ferred AS relationship dataset [56]. The dataset contains 
sibling relations, which permitted infinitely long valley-free 
routes in some circumstances. To avoid optimistic bias, we 
replaced all sibling relationships with the more restrictive 
peer relationship. This reclassification causes 5.5% of the 



network to lose complete reachability, so we disallow traffic 
originating from or terminating at these ASes. We consider 
each AS without customer ASes to be a service provider 
for end users and add a host vnode to represent these 
users. Ideally, we would model the number of users, but 
accurate ISP customer size data are not available. Rather 
than risk skewing our conclusions, we restrict ourselves 
to measuring anonymity based on the number of possible 
source or destination ISPs, recognizing that some ISPs are 
far larger than others. 

We consider a mixture of ASes following the strict and 
loose valley-free routing policies defined in Section 4.2. 
Experimentation shows that when all ASes follow strict 
valley-free, the number of routing options is limited, but 
introducing even a small proportion of loose valley-free 
ASes leads to far greater diversity. 10% loose valley-free 
ASes gives a median of 91,000 options for each path, and 
we use this topology for the remainder of our evaluation. 
Studies show that strict valley-free routing is not universal 
today [44], but we acknowledge that our selection of 10% 
is arbitrary. 

6.2. Single Segment Performance 

To select a path segment, the source compiles a set of 
available routes using a modified depth first search. Our 
implementation limits this set to a maximum cost of 13, 
based on the longest distance present in the network, and 
also a maximum of 20,000 routes at each path cost to 
limit computation. We first select a cost from the set of 
available costs (i.e. costs with at least one route) and then 
select a random route of this cost. We present results for the 
following four path selection algorithms, expressed in terms 
of the probability P(X) that each will select a particular 
available cost A. In all cases, k is a constant specific to the 
algorithm: 

Shortest. Shortest possible path is selected in all cases: 
P(A) = 1 if A = X s hortest, or 0 otherwise. 

Uniform. Each cost is equally likely: P(X) oc 1. 

Exponential. Longer paths are less likely: P(X) oc k x . 

Exponential. Longer paths are less likely, but costs under 
four are prohibited: P(A) = 0 if A < 4, or P{\) oc k x 
otherwise. 

Figure 10a presents a distribution of the cost produced by 
these algorithms for random source-destination pairs, while 
Figure 10b presents the source anonymity set size distribu- 
tion as measured by an attacker at the destination. These 
results demonstrate that all the algorithms that produce a 
non-deterministic cost succeed in achieving a meaningful 
anonymity improvement over the shortest path algorithm, 
by three bits of entropy in the best case and over one 
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bit for the majority of cases. Algorithms that select long 
paths more frequently achieve better anonymity but result 
in a higher average cost. The change from Exponential to 
Exponential is particularly striking, showing a dramatic 
improvement in worst case anonymity from the exclusion 
of short paths, with only a modest cost increase. The Expo- 
nential algorithm results in an average cost approximately 
25% greater than shortest path routing, and yet it achieves 
an anonymity set containing over half the network in 98% 
of the tests. 

In Figure 10c, we compare the set size metric with 
the effective set size metric. This data is from a smaller 
network, approximately one quarter of the full Internet, and 
is presented on a linear scale. The data shows that our non- 



deterministic methods have a slightly smaller effective set 
size once their cost selection probability is fully incorpo- 
rated, but this reduction is small. The largest reduction in 
set size is around 0.35 bits. This shows that cost window 
selection is effective at limiting information gain by the 
attacker. 

6.3. Complete Path Performance 

We now evaluate the anonymity and cost properties of 
complete paths. Dovetail includes parameters that users 
can configure to trade performance against anonymity. Our 
objective here is to demonstrate the anonymity limit of this 
sliding scale, but many users will prefer a lower setting. The 



parameter settings we use are: 

Dovetail to Matchmaker Cost = Two. Provides strong 
limits on matchmaker capability without requiring that 
dovetail and matchmaker are adjacent. 

Source to Matchmaker Algorithm = Exponential. 

Effectively delivers Exponential at the dovetail. 

Dovetail to Destination Algorithm = Exponential*!. 

Shown in Section 6.2 to provide near maximum anonymity. 

In our experiment, we select source and destination 
hosts at random and construct a dovetail path between 
them. The matchmaker generates eight tail path options 
and the source selects one from this set. Where possible, 
the source selects an option that does not reuse a head AS, 
but in 23% of paths constructed all options required such 
reuse. 3 We measure the source and destination anonymity 
set size observable by an attacker at each location in the 
path. Random selection of a matchmaker decouples the 
source and destination anonymity sets, and therefore we 
can also consider the source-destination unlinkability, i.e. 
the number of potential source-destination pairs associated 
with an observed connection, to be the product of the source 
and destination anonymity set sizes. Figure 11 presents 
the distribution of these three properties at a series of key 
locations along the path, and Figure 12 presents the cost 
distribution, with the cost of shortest path routing included 
for comparison with IP and LAP. 

Each successive step adds ambiguity to the source iden- 
tity. At the dovetail AS, source anonymity is approximately 
equal to network size in 80% of cases. Destination identity 
is known at the dovetail and all subsequent locations, but 
locations prior to the dovetail are unable to calculate a 
meaningful destination identity. No location except the 
source is able to clearly link source and destination. The 
AS immediately preceding the dovetail is most likely to 
be duplicated in head and tail segments, being adjacent to 
an AS that is always present in both. As illustrated by the 
destination anonymity for "Before Dovetail", this occurred 
in 5% of our experiments. The dovetail may partially 
calculate source identity in around 20% of cases, but this is 
limited to around one thousand possible source ISPs, each 
containing many users. Figure 12 shows that a Dovetail 
path passes through approximately 2.5 times more ASes 
than the shortest path routing used in the current Internet. 
This is a modest penalty when compared to the prevailing 
option for anonymity today; an anonymous circuit in Tor 
typically passes through three relays for a total of four IP 
paths, including six more last-mile connections than a direct 
path, and incurs additional processing and queuing delays 
at each relay. 

3. We plan in future work to develop a heuristic to select dovetail 
vnodes with a lower probability of reuse. 
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6.4. Resource Utilization 

Rather than proposing a near-term solution, we aim to 
show that privacy is a feasible feature to include in future 
routing protocol designs. Nevertheless, we now briefly 
consider a variety of resource requirements to demonstrate 
that implementation would be feasible. 

Host memory utilization. Each Dovetail host must main- 
tain a model of the Internet to generate routes. In the 2012 
dataset we use there are 252,666 visible pathlets, of which 
an average of 22% are known, requiring 680kB. 

Router memory utilization. A Dovetail forwarding table 
scales with the number of local peers and not the total 
number of Internet prefixes as with BGP All forwarding 
information is carried by the packet itself, and so a router 
need not store any information per connection. 

Router latency. The only cryptographic operation re- 
quired to forward a data packet is a symmetric decryption 
of one word. This is the same task performed by LAP; 
Hsiao et al. measure an additional latency of under one 
microsecond in a software-based implementation of their 
system [18]. 

Transmission efficiency. A Dovetail packet must specify 
a complete path rather than only an endpoint, potentially 
leading to large headers and low efficiency. The average 
header length in our experiments is 92 bytes. Given an 
MTU of 1500 bytes, this represents a 3.5% reduction in 
payload compared to IPv6. LAP would require a 60 byte 
header. 

7. Conclusion 

In this paper we presented Dovetail, a next-generation 
Internet routing protocol, and have demonstrated that it 
provides a workable solution for anonymity at the network 



layer. The overhead is approximately 2.5 times that of 
shortest path routing when configured to provide near 
complete anonymity against our chosen attacker, and we in- 
clude mechanisms to exchange anonymity for performance. 
We have demonstrated key aspects of the feasibility and 
effectiveness of this direction and hope this this motivates 
serious consideration of privacy as a requirement in the 
development of other next-generation routing protocols. 

We do not advocate an Internet without identity. Rather, 
we propose that identity exposure be reserved for substan- 
tial relationships and avoided for ephemeral relationships. 
This research direction is still in its infancy, and much work 
remains. We believe the most pressing matter is to develop 
diversity in the solution space by considering how privacy- 
preserving features might be integrated into other leading 
layer 3 proposals. Beyond this, there are user interface 
questions to explore in the provision of clear and mean- 
ingful network privacy choices and in characterizing the 
user response to these choices. This characterization will 
provide a better understanding of the resource requirements 
for network layer anonymity systems and of the motivations 
for service providers. Finally, further work remains with the 
Dovetail protocol. Network latency information should be 
incorporated into both the path selection algorithms and the 
anonymity assessments. Our evaluation should be extended 
to a wider range of network assumptions and additional 
privacy/performance settings. 
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